A simulation study of sample size for multilevel logistic regression models
Abstract
BACKGROUND: Many studies conducted in the health and social sciences collect individual-level data as outcome measures. Such data usually have a hierarchical structure, with patients clustered within physicians and physicians clustered within practices. Large survey data, including national surveys, also have a hierarchical or clustered structure: respondents are naturally clustered in geographical units (e.g., health regions) and may be grouped into smaller units. Outcomes of interest in many fields include not only continuous measures but also binary outcomes such as depression, presence or absence of a disease, and self-reported general health. In multilevel studies, an important problem is calculating a sample size that is adequate to generate unbiased and accurate estimates.

METHODS: In this paper, simulation studies are used to assess the effect of varying the sample size at both the individual and the group level on the accuracy of the estimates of the parameters and variance components of multilevel logistic regression models. In addition, the influence of the prevalence of the outcome and of the intra-class correlation coefficient (ICC) is examined.

RESULTS: The results show that the estimates of the fixed effect parameters are unbiased for 100 groups with a group size of 50 or higher. The estimates of the variance-covariance components are slightly biased even with 100 groups and a group size of 50. The biases for both fixed and random effects are severe for a group size of 5. The standard errors of the fixed effect parameters are unbiased, while those of the variance-covariance components are underestimated. The results suggest that low-prevalence events require larger sample sizes, with a minimum of 100 groups and 50 individuals per group.

CONCLUSION: We recommend using a minimum group size of 50 with at least 50 groups to produce valid estimates for multilevel logistic regression models. Group size should be adjusted when the prevalence of events is low, so that the expected number of events in each group is greater than one.
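The simulation design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the single standard-normal covariate, the slope value, and the seed are all assumptions made for illustration. It assumes a two-level random-intercept logistic model and uses the common latent-variable definition ICC = sigma^2 / (sigma^2 + pi^2 / 3) to turn a target ICC into a random-intercept variance. The intercept is set to the logit of the target prevalence, which fixes the prevalence for an average group at x = 0 and only approximates the marginal prevalence.

```python
import math
import numpy as np


def intercept_variance_from_icc(icc):
    """Random-intercept variance implied by a target ICC.

    Assumes the latent-variable definition for logistic models:
    ICC = sigma2 / (sigma2 + pi^2 / 3).
    """
    return icc * (math.pi ** 2 / 3) / (1.0 - icc)


def simulate_two_level_binary(n_groups, group_size, icc, prevalence,
                              beta=0.5, seed=0):
    """Simulate binary outcomes for individuals nested in groups.

    beta (slope of one standard-normal covariate) and seed are
    illustrative assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    sigma2 = intercept_variance_from_icc(icc)
    # One random intercept per group.
    u = rng.normal(0.0, math.sqrt(sigma2), size=n_groups)
    # Group index for every individual, e.g. [0, 0, ..., 1, 1, ...].
    group = np.repeat(np.arange(n_groups), group_size)
    x = rng.normal(size=n_groups * group_size)
    # Intercept on the logit scale (prevalence for an average group at x = 0).
    beta0 = math.log(prevalence / (1.0 - prevalence))
    eta = beta0 + beta * x + u[group]
    p = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, p)
    return group, x, y


def expected_events_per_group(group_size, prevalence):
    """Expected number of events in one group: group size times prevalence."""
    return group_size * prevalence


if __name__ == "__main__":
    group, x, y = simulate_two_level_binary(
        n_groups=100, group_size=50, icc=0.1, prevalence=0.1)
    print("observed event rate:", y.mean())
    print("expected events per group:", expected_events_per_group(50, 0.1))
```

Under these assumptions, a group of 5 individuals at 10% prevalence has 0.5 expected events, which falls below the "greater than one" guideline in the conclusion, while a group of 50 at the same prevalence has 5. Fitting a multilevel logistic model to many such simulated data sets and comparing the estimates with the true values would be the next step of a study of this kind; that step is omitted here.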
Similar resources
An application of two-level ordinal logistic regression models in determining the factors affecting the economic burden of type 2 diabetes in Iran
In recent years, multilevel regression models have been intensively developed in many fields, such as medicine, psychology, economics, and others. Such models are applicable to hierarchical data in which micro-level units are nested within macro-level units. To model these data when the response is not normally distributed, we use generalized multilevel regression models. In this paper, at first, multilevel ordinal logist...
Sample size determination for logistic regression
The problem of sample size estimation is important in medical applications, especially in cases of expensive measurements of immune biomarkers. This paper describes the problem of logistic regression analysis with sample size determination algorithms, namely the methods of univariate statistics, logistic regression, cross-validation and Bayesian inference. The authors, treating the regr...
Penalized Bregman Divergence Estimation via Coordinate Descent
Variable selection via penalized estimation is appealing for dimension reduction. For penalized linear regression, Efron, et al. (2004) introduced the LARS algorithm. Recently, the coordinate descent (CD) algorithm was developed by Friedman, et al. (2007) for penalized linear regression and penalized logistic regression and was shown to gain computational superiority. This paper explores...
Application of the bootstrap resampling method in logistic regression and its use in analyzing data on patients with breast cancer
Background and Aim: The purpose of this study was to assess the accuracy of the bootstrap method in logistic regression and to explore the method's use in logistic regression models in cases where the sample size is insufficient. Materials and Methods: We use data from 150 patients who had undergone surgery at the Cancer Institute, Emam Khomeini hospital, from 1999 to 2001. Then we drew...
Sufficient Sample Size and Power in Multilevel Ordinal Logistic Regression Models
Biomedical researchers frequently deal with ordinal outcome variables in multilevel models where patients are nested within doctors. A multilevel cumulative logit model can justifiably be applied where the outcome variable represents the mild, severe, and extremely severe intensity of diseases such as malaria and typhoid in the form of ordered categories. Based on our simulation co...
Journal title:
- BMC Medical Research Methodology
Volume 7, Issue
Pages -
Publication date 2007